NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

ScisTree2 enables large-scale inference of cell lineage trees and genotype calling using efficient local search

https://doi.org/10.1101/gr.280542.125

Zhang, Haotian; Zhang, Yiming; Gao, Teng; Wu, Yufeng (September 2025, Genome Research)

In a multicellular organism, cell lineages share a common evolutionary history. Knowing this history can facilitate the study of development, aging, and cancer. Cell lineage trees represent the evolutionary history of cells sampled from an organism. Recent developments in single-cell sequencing have greatly facilitated the inference of cell lineage trees. However, single-cell data are sparse and noisy, and the size of single-cell data is increasing rapidly. Accurate inference of cell lineage trees from large single-cell data is computationally challenging. In this paper, we present ScisTree2, a fast and accurate cell lineage tree inference and genotype calling approach based on the infinite-sites model. ScisTree2 relies on an efficient local search approach to find optimal trees. ScisTree2 also calls single-cell genotypes based on the inferred cell lineage tree. Experiments on simulated and real biological data show that ScisTree2 achieves better overall accuracy while being significantly more efficient than existing methods. To the best of our knowledge, ScisTree2 is the first model-based cell lineage tree inference and genotype calling approach that is capable of handling datasets of tens of thousands of cells or more.
Free, publicly-accessible full text available September 3, 2026
Causally Modeling the Linguistic and Social Factors that Predict Email Response

https://doi.org/10.18653/v1/2025.naacl-long.594

Xu, Yinuo; Chen, Hong; Rakshit, Sushrita; Ananthasubramaniam, Aparna; Yadav, Omkar; Zheng, Mingqian; Jiang, Michael; Zhang, Lechen; Yi, Bowen; Alkiek, Kenan; et al. (January 2025, Association for Computational Linguistics)

Full Text Available
A general approach for inferring the ancestry of recent ancestors of an admixed individual

https://doi.org/10.1073/pnas.2316242120

Zhang, Yiming; Zhang, Haotian; Wu, Yufeng (January 2024, Proceedings of the National Academy of Sciences)

The genome of an individual from an admixed population consists of segments that originated from different ancestral populations. Most existing ancestry inference approaches focus on calling these segments for the extant individual. In this paper, we present a general approach for inferring the ancestry of recent ancestors from an extant genome. Given the genome of an individual from a recently admixed population, our method can estimate the proportions of the genomes of the recent ancestors of this individual that originated from some ancestral populations. The key step of our method is the inference of ancestors (called founders) right after the formation of an admixed population. The inferred founders can then be used to infer the ancestry of recent ancestors of an extant individual. Our method is implemented in a computer program called PedMix2. To the best of our knowledge, there is no existing method that can practically infer ancestors beyond grandparents from an extant individual’s genome. Results on both simulated and real data show that PedMix2 performs well in ancestry inference.
Full Text Available
Reliable and Secure Deep Learning-Based OFDM-DCSK Transceiver Design Without Delivery of Reference Chaotic Sequences

https://doi.org/10.1109/TVT.2022.3175968

Zhang, Haotian; Zhang, Lin; Jiang, Yuan; Wu, Zhiqiang (August 2022, IEEE Transactions on Vehicular Technology)

Full Text Available
An extensive study on pre-trained models for program understanding and generation

https://doi.org/10.1145/3533767.3534390

Zeng, Zhengran; Tan, Hanzhuo; Zhang, Haotian; Li, Jing; Zhang, Yuqun; Zhang, Lingming (July 2022, International Symposium on Software Testing and Analysis)

Full Text Available
One size does not fit all: security hardening of MIPS embedded systems via static binary debloating for shared libraries

https://doi.org/10.1145/3503222.3507768

Zhang, Haotian; Ren, Mengfei; Lei, Yu; Ming, Jiang (February 2022, Proceedings of the 27th International Conference on Architectural Support for Programming Languages and Operating Systems)

Full Text Available
Spotting Temporally Precise, Fine-Grained Events in Video

https://doi.org/10.1007/978-3-031-19833-5_3

Hong, James; Zhang, Haotian; Gharbi, Michaël; Fisher, Matthew; Fatahalian, Kayvon (January 2022, Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel)

We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment in time events occur). Precise spotting requires models to reason globally about the full-time scale of actions and locally to identify subtle frame-to-frame appearance and motion differences that identify events during these actions. Surprisingly, we find that top-performing solutions to prior video understanding tasks such as action detection and segmentation do not simultaneously meet both requirements. In response, we propose E2E-Spot, a compact, end-to-end model that performs well on the precise spotting task and can be trained quickly on a single GPU. We demonstrate that E2E-Spot significantly outperforms recent baselines adapted from the video action detection, segmentation, and spotting literature to the precise spotting task. Finally, we contribute new annotations and splits to several fine-grained sports action datasets to make these datasets suitable for future work on precise spotting.
Full Text Available
Deep Just-in-Time Defect Prediction: How Far Are We?

https://doi.org/10.1145/3460319.3464819

Zeng, Zhengran; Zhang, Yuqun; Zhang, Haotian; Zhang, Lingming (July 2021, ACM SIGSOFT International Symposium on Software Testing and Analysis)

Defect prediction aims to automatically identify potentially defective code with minimal human intervention and has been widely studied in the literature. Just-in-Time (JIT) defect prediction focuses on program changes rather than whole programs, and has been widely adopted in continuous testing. CC2Vec, a state-of-the-art JIT defect prediction tool, first constructs a hierarchical attention network (HAN) to learn distributed vector representations of both code additions and deletions, and then concatenates them with two other embedding vectors, representing commit messages and overall code changes, extracted by the existing DeepJIT approach to train a model for predicting whether a given commit is defective. Although CC2Vec has been shown to be the state of the art for JIT defect prediction, it was only evaluated on a limited dataset and not compared with all representative baselines. Therefore, to further investigate the efficacy and limitations of CC2Vec, this paper performs an extensive study of CC2Vec on a large-scale dataset with over 310,370 changes (8.3× larger than the original CC2Vec dataset). More specifically, we also empirically compare CC2Vec against DeepJIT and representative traditional JIT defect prediction techniques. The experimental results show that CC2Vec cannot consistently outperform DeepJIT, and neither of them can consistently outperform traditional JIT defect prediction. We also investigate the impact of individual traditional defect prediction features and find that the added-line-number feature outperforms other traditional features. Inspired by this finding, we construct a simplistic JIT defect prediction approach that simply adopts the added-line-number feature with the logistic regression classifier. Surprisingly, such a simplistic approach can outperform CC2Vec and DeepJIT in defect prediction, and can be 81k×/120k× faster in training/testing. The paper also provides various practical guidelines for advancing JIT defect prediction in the near future.
Full Text Available
Intelligent and Reliable Deep Learning LSTM Neural Networks-Based OFDM-DCSK Demodulation Design

https://doi.org/10.1109/TVT.2020.3022043

Zhang, Lin; Zhang, Haotian; Jiang, Yuan; Wu, Zhiqiang (December 2020, IEEE Transactions on Vehicular Technology)

Full Text Available
PatchScope: Memory Object Centric Patch Diffing

https://doi.org/10.1145/3372297.3423342

Zhao, Lei; Zhu, Yuncong; Ming, Jiang; Zhang, Yichen; Zhang, Haotian; Yin, Heng (October 2020, Proceedings of the 27th ACM Conference on Computer and Communications Security)
Full Text Available
